Skip to content

Kd vllm generation#5351

Merged
cmpatino merged 9 commits into
huggingface:mainfrom
cmpatino:kd-vllm-generation
Mar 26, 2026
Merged

Kd vllm generation#5351
cmpatino merged 9 commits into
huggingface:mainfrom
cmpatino:kd-vllm-generation

Conversation

@cmpatino

@cmpatino cmpatino commented Mar 23, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Addresses the comment from #5137 to use trl.generation.VLLMGeneration instead of the separate vLLM logic.

Before submitting

  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Moderate risk because it rewires GOLD on-policy generation, prompt/attention masking, and weight-sync behavior for vLLM, which can affect training correctness and stability in distributed setups.

Overview
GOLD’s vLLM path is refactored to use trl.generation.VLLMGeneration instead of trainer-local server/colocate logic, removing the custom callback/weight-update code and centralizing sync + generation behind sync_weights()/generate().

On-policy buffering is updated to pass prompt token IDs directly (respecting prompt_attention_mask), build concatenated attention_mask explicitly, and generate labels only on completion tokens that are not padded. GOLDConfig gains new vLLM knobs (vllm_server_base_url, vllm_group_port, vllm_max_model_length, vllm_model_impl), and tests are updated/added to lock in these behaviors (prompt masking for vLLM prompts, special-token reconstruction, and defaulting vllm_max_model_length to max_length).

Written by Cursor Bugbot for commit dcfce59. This will update automatically on new commits. Configure here.

@cmpatino cmpatino marked this pull request as ready for review March 23, 2026 12:15
Comment thread trl/experimental/gold/gold_trainer.py
Comment thread trl/experimental/gold/gold_trainer.py
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread trl/experimental/gold/gold_trainer.py
Comment thread trl/experimental/gold/gold_trainer.py Outdated

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

prompts=prompt_ids_list,
images=None,
num_generations=self.num_generations,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incompatible prompt handling for multi-generation vLLM usage

Medium Severity

VLLMGeneration.generate assumes prompts are pre-duplicated num_generations times (as GRPO does via its RepeatSampler), but GOLD passes each prompt exactly once. In server mode, the method does all_prompts[::num_generations] to "deduplicate," which skips prompts when num_generations > 1. In colocate mode, n=1 is hardcoded, so extra generations are silently ignored. The old code had custom _generate_vllm_server_global / _generate_vllm_colocate methods that correctly handled n=num_generations without requiring pre-duplication. This is a regression when num_generations > 1 (default is 1, so typical usage is unaffected).

Fix in Cursor Fix in Web

@cmpatino cmpatino merged commit 05eac2c into huggingface:main Mar 26, 2026
4 checks passed
@cmpatino cmpatino deleted the kd-vllm-generation branch March 26, 2026 09:37
@cmpatino cmpatino restored the kd-vllm-generation branch March 26, 2026 09:58
@cmpatino cmpatino deleted the kd-vllm-generation branch March 26, 2026 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants